1. INTRODUCTION


In an early paper, one of the authors [13] introduced a quadratic differential metric over the parameter space of a parametric family of probability distributions and proposed the geodesic distance induced by the metric as a measure of dissimilarity between two probability distributions. This metric was derived from heuristic considerations, and it was expressed in terms of the Fisher information matrix (Fisher [6]; see Rao [16, pp. 329-332] for details). Such a choice of the matrix for the quadratic differential metric was shown to have attractive properties through the concepts of discrimination and divergence measures between probability distributions ([9], [14, 15] and [16, p. 332]). Quite recently, Atkinson and Mitchell [1] obtained the geodesic distances induced by the metric introduced in [13], which will be referred to in this paper as the information metric, for a number of parametric families of probability distributions.

In this paper, we consider a general function space and study a metric based on the Hessian of the $\phi$-entropy functional, which was also introduced in an earlier paper by the authors [5]. A special choice of $\phi$ leads to the $\alpha$-order entropy of Havrda and Charvát [7], and this gives rise to a class of metrics, which are called $\alpha$-order entropy metrics. The above-mentioned information metric is a limiting member of this class as $\alpha \to 1$, which corresponds to the Shannon entropy [18].

The geodesic distances induced by the $\alpha$-order entropy metric are obtained for the multinomial and normal distributions. Their relation to other distance measures due to Möbius, Poincaré, Hellinger and Carathéodory is examined. The relationship of the information metric to the Bergman metric will be discussed elsewhere.

We also extend the concepts of the $J$, $K$, $L$-divergence measures between multinomial populations considered in the earlier paper [5] to more general distributions, and study their inter-relationships and convexity properties.

Dissimilarity measures between probability distributions play an important role in the discussion of problems of statistical inference and in practical applications to study affinities among a given set of populations (see, for instance, Matusita [10, 11], Pitman [12, pp. 6-23], Rao [16, p. 352], [17]). This paper provides a unified approach for measuring dissimilarity between probability distributions through distance and divergence measures having some desirable properties.


2. $\phi$-ORDER ENTROPY METRIC

Throughout this paper, $F$ denotes a linear space of functions $p = p(x)$, $x \in X$, measurable with respect to a $\sigma$-finite measure $\mu$ on a $\sigma$-algebra of the subsets of $X$. The convex subset of probability density functions in $F$ is denoted by $F_1$:

$$F_1 = \Big\{ p \in F : \int_X p(x)\,d\mu(x) = 1,\ p(x) \ge 0 \text{ for } \mu\text{-almost all } x \in X \Big\}. \tag{2.1}$$

Let $U$ be an open convex subset of $F$ and let $\phi$ be a $C^2$-function on an interval $I$ containing

$$\bigcup \{ p(x) \in \mathbb{R} : p \in U,\ x \in X \}.$$

For $p \in U$, we define the $\phi$-entropy functional

$$H_\phi(p) = \int_X \phi[p(x)]\,d\mu(x). \tag{2.2}$$

The derivative of $H_\phi$ at $p \in U$ in the direction $f \in F$ is given by

$$dH_\phi(p : f) = \frac{d}{dt} H_\phi(p + tf)\Big|_{t=0}, \qquad t \in \mathbb{R},$$

and thus, by virtue of (2.2),

$$dH_\phi(p : f) = \int_X \phi'[p(x)]\,f(x)\,d\mu(x).$$

The second derivative at $p \in U$ along $g \in F$ is

$$d^2H_\phi(p : f, g) = \int_X \phi''[p(x)]\,f(x)\,g(x)\,d\mu(x) \tag{2.3}$$

and, in particular, the Hessian is

$$\Delta_f H_\phi(p) = d^2H_\phi(p : f, f) = \int_X \phi''[p(x)]\,\{f(x)\}^2\,d\mu(x). \tag{2.4}$$

We note that $\Delta_f H_\phi(p) \le 0$ for every $f \in F$ and $p \in U$ if and only if $\phi$ is concave on $I$. This is equivalent to the requirement that $H_\phi$ be a concave functional on $U$.
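As a quick sanity check of (2.4), one can take $X$ finite with $\mu$ the counting measure, so that the integrals become sums, and compare the formula with a finite-difference second derivative of $t \mapsto H_\phi(p + tf)$. The sketch below assumes the illustrative concave choice $\phi(x) = -x\log x$; the helpers `H`, `hessian_formula` and `hessian_fd` are hypothetical names, not taken from the paper.

```python
import math

# Assumption: X is finite and mu is the counting measure, so integrals are sums.
# phi(x) = -x log x is an illustrative concave choice on R_+.


def phi(x):
    return -x * math.log(x)


def phi_dd(x):
    # second derivative of -x log x
    return -1.0 / x


def H(p):
    """phi-entropy functional (2.2) with counting measure."""
    return sum(phi(v) for v in p)


def hessian_formula(p, f):
    """Right-hand side of (2.4): sum of phi''(p) f^2."""
    return sum(phi_dd(v) * w * w for v, w in zip(p, f))


def hessian_fd(p, f, h=1e-4):
    """Central second difference of t -> H(p + t f) at t = 0."""
    plus = [v + h * w for v, w in zip(p, f)]
    minus = [v - h * w for v, w in zip(p, f)]
    return (H(plus) - 2.0 * H(p) + H(minus)) / (h * h)


p = [0.2, 0.3, 0.5]
f = [0.1, -0.4, 0.3]
print(hessian_formula(p, f), hessian_fd(p, f))  # both close to -0.7633
```

The negative value reflects the concavity of $\phi$, in line with the remark above.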

We shall also consider a parametric family of probability density functions $p = p(x \mid \theta)$ with $x \in X$ and $\theta = (\theta_1, \ldots, \theta_n) \in \Omega$, a manifold in $\mathbb{R}^n$. We assume that the subfamily of $F_1$ in (2.1)

$$F_\Omega = \{ p(\cdot \mid \theta) \in F_1 : \theta \in \Omega \} \tag{2.5}$$

is sufficiently smooth in $\theta \in \Omega$ and satisfies the usual regularity properties, not explicitly stated to avoid lengthy discussion. Accordingly, we shall write

$$dp = dp(\theta) = \sum_{j=1}^{n} \{\partial_{\theta_j} p(\cdot \mid \theta)\}\,d\theta_j; \qquad \theta \in \Omega,\ p(\cdot \mid \theta) \in F_\Omega. \tag{2.6}$$

Then the Hessian in (2.4) along a direction of the tangent space of the parameter space $\Omega$ is obtained by replacing $f$ by $dp$ of (2.6). Thus

$$\Delta_\theta H_\phi(p) = d^2\{H_\phi(p)\}(\theta) = \int_X \phi''(p)\,[dp]^2\,d\mu(x); \qquad p = p(x \mid \theta). \tag{2.7}$$

In particular, when $\phi$ is concave on $\mathbb{R}_+ = (0, \infty)$,

$$ds_\phi^2(\theta) = -\Delta_\theta H_\phi(p) \tag{2.8}$$

is a positive definite form on the tangent space, which may be regarded as a differential metric of a Riemannian geometry. This can also be written as

$$ds_\phi^2(\theta) = \sum_{k,m=1}^{n} g_{km}^{\phi}(\theta)\,d\theta_k\,d\theta_m, \tag{2.9}$$

where

$$g_{km}^{\phi} = g_{km}^{\phi}(\theta) = -\int_X \phi''(p)\,(\partial_{\theta_k} p)(\partial_{\theta_m} p)\,d\mu(x), \qquad p = p(x \mid \theta) \in F_\Omega. \tag{2.10}$$

The metric in (2.9) and the matrix $[g_{km}^{\phi}]$ in (2.10) will be called the $\phi$-entropy metric and the $\phi$-entropy matrix, respectively. The distance between probability density functions in $F_\Omega$ is defined as the geodesic distance between their parameter values determined by the metric (2.9).

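We shall now consider some special choices of $\phi$. For this purpose, we define, for $\alpha \in \mathbb{R}$, two families $\{\phi_\alpha\}$ and $\{\psi_\alpha\}$ of smooth functions on $\mathbb{R}_+$:

$$\phi_\alpha(x) = \begin{cases} (\alpha-1)^{-1}(x - x^\alpha), & \alpha \neq 1, \\ -x \log x, & \alpha = 1, \end{cases} \tag{2.11}$$

and

$$\psi_\alpha(x) = \begin{cases} [\alpha(\alpha-1)]^{-1}(1 - \alpha + \alpha x - x^\alpha), & \alpha \neq 0, 1, \\ 1 - x + \log x, & \alpha = 0, \\ -x \log x + x - 1, & \alpha = 1. \end{cases} \tag{2.12}$$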

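When the smooth function $\phi$ in (2.2) is chosen to be $\phi_\alpha$ of (2.11), we shall write $H_\alpha = H_{\phi_\alpha}$. In this way,

$$H_\alpha(p) = \begin{cases} (\alpha-1)^{-1}\left[1 - \int_X p^\alpha\,d\mu(x)\right], & \alpha \neq 1, \\ -\int_X p \log p\,d\mu(x), & \alpha = 1, \end{cases} \qquad p \in F_1, \tag{2.13}$$

is the $\alpha$-order entropy [7], while the 1-order entropy $H = H_1$ is the Shannon entropy [18]. For $\alpha \neq 0$, the metric (2.9) with $\phi = \phi_\alpha$ is a constant multiple of the metric denoted below by $ds_\alpha^2(\theta)$: in order that the value $\alpha = 0$ also be included, we replace $\phi_\alpha$ by $\psi_\alpha$ of (2.12), which satisfies $\phi_\alpha'' = \alpha\,\psi_\alpha''$. In this way

$$ds_\alpha^2(\theta) = \sum_{k,m=1}^{n} g_{km}^{(\alpha)}(\theta)\,d\theta_k\,d\theta_m \tag{2.14}$$

with

$$g_{km}^{(\alpha)} = g_{km}^{(\alpha)}(\theta) = \int_X p^\alpha\,(\partial_{\theta_k} \log p)(\partial_{\theta_m} \log p)\,d\mu(x), \qquad p = p(x \mid \theta) \in F_\Omega. \tag{2.15}$$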

We call (2.14) the $\alpha$-order entropy metric and the matrix $[g_{km}^{(\alpha)}]$ in (2.15) the $\alpha$-order entropy matrix. The geodesic pseudo-distance induced by $ds_\alpha^2(\theta)$ is denoted by $S_\alpha$ and is called the $\alpha$-order entropy pseudo-distance.
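The following sketch evaluates the $\alpha$-order entropy (2.13) and the functions $\psi_\alpha$ of (2.12) for a discrete distribution, assuming counting measure on a finite $X$. It shows numerically that the special cases $\alpha = 1$ (and $\alpha = 0$ for $\psi_\alpha$) are limits of the general formulas. The function names are illustrative.

```python
import math


def h_alpha(p, alpha):
    """alpha-order entropy (2.13) of a discrete distribution p (counting measure)."""
    if alpha == 1:
        return -sum(v * math.log(v) for v in p if v > 0)
    return (1.0 - sum(v ** alpha for v in p)) / (alpha - 1.0)


def psi_alpha(x, alpha):
    """The functions psi_alpha of (2.12), for x > 0."""
    if alpha == 0:
        return 1.0 - x + math.log(x)
    if alpha == 1:
        return -x * math.log(x) + x - 1.0
    return (1.0 - alpha + alpha * x - x ** alpha) / (alpha * (alpha - 1.0))


p = [0.1, 0.2, 0.3, 0.4]
for a in (0.999, 1.0, 1.001):
    print(a, h_alpha(p, a))  # approaches the Shannon entropy as a -> 1
print(h_alpha(p, 2))         # 1 - sum of p_k^2, about 0.7
```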

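In the special case of $\alpha = 1$, corresponding to the Shannon entropy, which is widely used in applied research, we have (dropping the suffix $\alpha = 1$)

$$ds^2(\theta) = \sum_{k,m=1}^{n} g_{km}(\theta)\,d\theta_k\,d\theta_m \tag{2.16}$$

with

$$g_{km} = g_{km}(\theta) = \int_X p\,(\partial_{\theta_k} \log p)(\partial_{\theta_m} \log p)\,d\mu(x), \qquad p = p(x \mid \theta) \in F_\Omega. \tag{2.17}$$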
The expression (2.16) is the information metric [13] mentioned in the introduction, while $[g_{km}]$ is the Fisher information matrix. The geodesic pseudo-distance $S$ induced by $ds^2(\theta)$ will be called the information pseudo-distance (a pseudo-distance satisfies all the postulates of a distance except that it may vanish for elements which are distinct).
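To see (2.15) and (2.17) on the simplest example, the sketch below assumes a Bernoulli family $p(1 \mid \theta) = \theta$, $p(0 \mid \theta) = 1 - \theta$ with counting measure (an illustrative choice, not one treated in the text). It computes $g^{(\alpha)}_{11}(\theta)$ with a numerically differentiated score; the exact value is $\theta^{\alpha-2} + (1-\theta)^{\alpha-2}$, which for $\alpha = 1$ is the Fisher information $1/[\theta(1-\theta)]$.

```python
import math


def log_p(k, theta):
    # illustrative Bernoulli family: p(1|theta) = theta, p(0|theta) = 1 - theta
    return math.log(theta if k == 1 else 1.0 - theta)


def g_alpha(theta, alpha, h=1e-6):
    """Entry g^(alpha)_11 of (2.15) with counting measure on X = {0, 1}."""
    total = 0.0
    for k in (0, 1):
        p = math.exp(log_p(k, theta))
        score = (log_p(k, theta + h) - log_p(k, theta - h)) / (2 * h)  # d/dtheta log p
        total += p ** alpha * score ** 2
    return total


theta = 0.3
for alpha in (0.5, 1.0, 2.0):
    exact = theta ** (alpha - 2) + (1 - theta) ** (alpha - 2)
    print(alpha, g_alpha(theta, alpha), exact)
```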


3. THE J, K, L-DIVERGENCE MEASURES


3.1 Definitions and inter-relationships

We consider the convex subset $U$ of $F$, the function $\phi$ on the interval $I$ and the $\phi$-entropy functional $H_\phi$ as defined in (2.2). For $p, q \in U$, the $J$-divergence (with respect to $H_\phi$) is defined to be the Jensen difference

$$J_\phi(p,q) = 2H_\phi\!\left(\frac{p+q}{2}\right) - H_\phi(p) - H_\phi(q),$$

which can be written in the explicit form

$$J_\phi(p,q) = \int_X \left\{2\phi\!\left(\frac{p+q}{2}\right) - \phi(p) - \phi(q)\right\}d\mu(x), \qquad p = p(x),\ q = q(x) \in U. \tag{3.1}$$


We also consider other measures of divergence, special forms of which have received numerous practical applications: the $K$-divergence

$$K_\phi(p,q) = \int_X (p - q)\,[q^{-1}\phi(q) - p^{-1}\phi(p)]\,d\mu(x) \tag{3.2}$$

and the $L$-divergence

$$L_\phi(p,q) = -\int_X \left[p\,\phi\!\left(\frac{q}{p}\right) + q\,\phi\!\left(\frac{p}{q}\right)\right]d\mu(x). \tag{3.3}$$

The following theorem gives some results concerning the $J$, $K$, $L$-divergences and their inter-relationships.

Theorem 1. The following hold:

(i) If $\phi$ is concave on $\mathbb{R}_+$, then $J_\phi(p,q) \ge 0$ for $p, q \in F_1$.

(ii) If $F(x) = x\phi(x^{-1}) + \phi(x)$ is non-positive on $\mathbb{R}_+$, then $L_\phi(p,q) \ge 0$ for $p, q \in F_1$.

(iii) If $\psi(x) = \phi(x)/x$ is decreasing on $\mathbb{R}_+$, then $K_\phi(p,q) \ge 0$ for $p, q \in F_1$.

(iv) If $\psi$ is decreasing and convex on $\mathbb{R}_+$, then $K_\phi(p,q) \ge J_\phi(p,q)$ for $p, q \in F_1$.

(v) If $\phi$ is concave and $\psi$ is convex on $\mathbb{R}_+$, then $K_\phi(p,q) \ge J_\phi(p,q) \ge 0$ for $p, q \in F_1$.

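Proof. Items (i) and (iii) are trivial. As for item (ii), we have

$$L_\phi(p,q) = -\int_X q\,F\!\left(\frac{p}{q}\right)d\mu(x), \qquad p, q \in F_1,$$

and item (ii) follows. We now prove item (iv). From (3.1) and (3.2), we have

$$K_\phi(p,q) - J_\phi(p,q) = \int_X G(p,q)\,d\mu(x),$$

where

$$G(x,y) = \frac{y}{x}\,\phi(x) + \frac{x}{y}\,\phi(y) - 2\phi\!\left(\frac{x+y}{2}\right); \qquad x, y \in \mathbb{R}_+.$$

This may be written as

$$\frac{G(x,y)}{x+y} = \frac{y}{x+y}\,\psi(x) + \frac{x}{x+y}\,\psi(y) - \psi\!\left(\frac{x+y}{2}\right).$$

Since $\psi$ is convex, the first two terms together are at least $\psi(2xy/(x+y))$, and since $2xy/(x+y) \le (x+y)/2$ and $\psi$ is decreasing, this is at least $\psi((x+y)/2)$. Hence $G \ge 0$, which proves item (iv). Finally, item (v) follows from items (i) and (iv): writing $\phi(x) = x\psi(x)$, we have

$$2\psi'(x) = -x\psi''(x) + \phi''(x) \le 0, \qquad x \in \mathbb{R}_+,$$

so that $\psi$ is decreasing as well as convex. This concludes the proof.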
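A minimal numerical illustration of Theorem 1, assuming a finite $X$ with counting measure and the concave choice $\phi(x) = -x\log x$ (for which $\psi(x) = -\log x$ is decreasing and convex, and $F(x) = (1-x)\log x \le 0$). The general divergences (3.1)-(3.3) are coded directly from their definitions, so the inequalities of items (i)-(v) can be spot-checked on a pair of distributions; this illustrates, but of course does not prove, the theorem. The helper names are hypothetical.

```python
import math


def J(phi, p, q):
    """J-divergence (3.1), counting measure on a finite X."""
    return sum(2 * phi((a + b) / 2) - phi(a) - phi(b) for a, b in zip(p, q))


def K(phi, p, q):
    """K-divergence (3.2)."""
    return sum((a - b) * (phi(b) / b - phi(a) / a) for a, b in zip(p, q))


def L(phi, p, q):
    """L-divergence (3.3)."""
    return -sum(a * phi(b / a) + b * phi(a / b) for a, b in zip(p, q))


def shannon_phi(x):
    return -x * math.log(x)  # phi_1 of (2.11)


p = [0.1, 0.6, 0.3]
q = [0.4, 0.4, 0.2]
jd, kd, ld = J(shannon_phi, p, q), K(shannon_phi, p, q), L(shannon_phi, p, q)
print(jd, kd, ld)  # J >= 0, K >= J, and for this phi K = L
```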

When the function $\phi$ is replaced by $\phi_\alpha$ of (2.11), the resulting divergences $J_{\phi_\alpha}$, $K_{\phi_\alpha}$ and $L_{\phi_\alpha}$ will be called the "$\alpha$-order $J$, $K$ and $L$ divergences", and they will be denoted by $J_\alpha$, $K_\alpha$ and $L_\alpha$, respectively. As in the case of the $\alpha$-order entropy $H_\alpha$, the index $\alpha = 1$ will be dropped from these divergences and, thus, $J = J_1$, $K = K_1$ and $L = L_1$. For $p, q \in F_1$, the explicit expressions of $J_\alpha$, $K_\alpha$ and $L_\alpha$ are as follows:
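$$J_\alpha(p,q) = \begin{cases} (\alpha-1)^{-1}\int_X \left[p^\alpha + q^\alpha - 2^{1-\alpha}(p+q)^\alpha\right]d\mu(x), & \alpha \neq 1, \\ \int_X \left[p \log p + q \log q - (p+q)\log\{(p+q)/2\}\right]d\mu(x), & \alpha = 1, \end{cases} \tag{3.4}$$

$$K_\alpha(p,q) = \begin{cases} (\alpha-1)^{-1}\int_X (p-q)(p^{\alpha-1} - q^{\alpha-1})\,d\mu(x), & \alpha \neq 1, \\ \int_X (p-q)(\log p - \log q)\,d\mu(x), & \alpha = 1, \end{cases} \tag{3.5}$$

$$L_\alpha(p,q) = \begin{cases} (\alpha-1)^{-1}\left[\int_X (p^{1-\alpha}q^\alpha + q^{1-\alpha}p^\alpha)\,d\mu(x) - 2\right], & \alpha \neq 1, \\ \int_X \left[q \log(p^{-1}q) + p \log(q^{-1}p)\right]d\mu(x), & \alpha = 1. \end{cases} \tag{3.6}$$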


We note that for $\alpha = 1$,

$$K(p,q) = L(p,q) = \int_X (p-q)(\log p - \log q)\,d\mu(x); \qquad p, q \in F_1,$$

which is the familiar Jeffreys-Kullback-Leibler divergence.

In this connection, we also mention the $\alpha$-order Hellinger pseudo-distance

$$M_\alpha(p,q) = \sqrt{2}\,|\alpha|^{-1}\left[\int_X (p^{\alpha/2} - q^{\alpha/2})^2\,d\mu(x)\right]^{1/2}. \tag{3.7}$$

The special case of (3.7) when $\alpha = 1$, $M(p,q) = M_1(p,q)$, has been extensively studied by Matusita [10, 11] and recently discussed by Pitman [12, pp. 6-23] from the point of view of statistical inference.

The following corollary is a consequence of Theorem 1:

Corollary 1. Let $\alpha > 0$. Then, for $p, q \in F_1$:

(i) $J_\alpha(p,q) \ge 0$;

(ii) $K_\alpha(p,q) \ge 0$;

(iii) $L_\alpha(p,q) \ge 0$;

(iv) $K_\alpha(p,q) \ge J_\alpha(p,q) \ge 0$, provided $0 < \alpha \le 2$;

(v) $K(p,q) = L(p,q) \ge J(p,q)$.

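Proof. We consider $\phi_\alpha$ of (2.11) and define

$$F_\alpha(x) = x\phi_\alpha(x^{-1}) + \phi_\alpha(x), \qquad \psi_\alpha(x) = x^{-1}\phi_\alpha(x); \qquad x \in \mathbb{R}_+$$

(here $\psi_\alpha$ denotes the function $\psi$ of Theorem 1 for $\phi = \phi_\alpha$, not the function of (2.12)). Since the cases with $\alpha = 1$ are limiting cases of $\alpha \neq 1$ as $\alpha \to 1$, we may assume that $\alpha \neq 1$, $\alpha > 0$. In this case

$$\phi_\alpha(x) = (\alpha-1)^{-1}(x - x^\alpha), \qquad \psi_\alpha(x) = (\alpha-1)^{-1}(1 - x^{\alpha-1}),$$

$$F_\alpha(x) = (\alpha-1)^{-1}x^{1-\alpha}(1 - x^\alpha)(x^{\alpha-1} - 1).$$

Since

$$\phi_\alpha''(x) = -\alpha x^{\alpha-2} < 0, \qquad F_\alpha(x) \le 0, \qquad \psi_\alpha'(x) = -x^{\alpha-2} < 0,$$

items (i)-(iii) follow from items (i)-(iii) of Theorem 1. Also, since $\psi_\alpha''(x) = (2-\alpha)x^{\alpha-3}$, item (iv) follows from item (v) of Theorem 1. Finally, (v) follows from (iv) and (3.5)-(3.6).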
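The closed forms (3.4)-(3.7) are easy to code for distributions on a finite set (counting measure assumed). The sketch below is illustrative only: it can be used to spot-check Corollary 1 on particular pairs $p, q \in F_1$, which of course does not prove it.

```python
import math


def J_a(p, q, a):
    """alpha-order J-divergence (3.4)."""
    if a == 1:
        return sum(x * math.log(x) + y * math.log(y) - (x + y) * math.log((x + y) / 2)
                   for x, y in zip(p, q))
    return sum(x ** a + y ** a - 2 ** (1 - a) * (x + y) ** a for x, y in zip(p, q)) / (a - 1)


def K_a(p, q, a):
    """alpha-order K-divergence (3.5)."""
    if a == 1:
        return sum((x - y) * (math.log(x) - math.log(y)) for x, y in zip(p, q))
    return sum((x - y) * (x ** (a - 1) - y ** (a - 1)) for x, y in zip(p, q)) / (a - 1)


def L_a(p, q, a):
    """alpha-order L-divergence (3.6)."""
    if a == 1:
        return sum(y * math.log(y / x) + x * math.log(x / y) for x, y in zip(p, q))
    s = sum(x ** (1 - a) * y ** a + y ** (1 - a) * x ** a for x, y in zip(p, q))
    return (s - 2) / (a - 1)


def M_a(p, q, a):
    """alpha-order Hellinger distance (3.7), a != 0."""
    s = sum((x ** (a / 2) - y ** (a / 2)) ** 2 for x, y in zip(p, q))
    return math.sqrt(2) / abs(a) * math.sqrt(s)


p = [0.1, 0.6, 0.3]
q = [0.4, 0.4, 0.2]
for a in (0.5, 1, 1.5, 2):
    print(a, J_a(p, q, a), K_a(p, q, a), L_a(p, q, a), M_a(p, q, a))
```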

It is worth pointing out that the divergence measures (3.1)-(3.3) based on the general $\phi$ and (3.4)-(3.7) based on the $\alpha$-order entropy can be used to generate a metric in the parameter space defining the probability distributions by considering two contiguous distributions. This is easily done by considering the Hessian along the tangent space of $F_\Omega$, namely when $p = p(\cdot \mid \theta)$ and $q = p$. The precise results are as follows:


(i) $d^2\{J_\phi(p,p)\}(\theta) = -\frac{1}{2}\int_X \phi''(p)\,[dp(\theta)]^2\,d\mu(x) = \frac{1}{2}\,ds_\phi^2(\theta)$, which is, up to the factor $\frac{1}{2}$, the $\phi$-entropy metric defined in (2.9);

(ii) $d^2\{K_\phi(p,p)\}(\theta) = -2\int_X [p^{-1}\phi(p)]'\,[dp(\theta)]^2\,d\mu(x)$;

(iii) $d^2\{L_\phi(p,p)\}(\theta) = -2\phi''(1)\int_X p^{-1}[dp(\theta)]^2\,d\mu(x) = -2\phi''(1)\,ds^2(\theta)$,

where $ds^2(\theta)$ is the information metric as in (2.16), so that when $\phi''(1) < 0$, this metric is essentially the information metric.
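Relations (i)-(iii) can be checked numerically by a second difference in the parameter. The sketch assumes a hypothetical three-point family $p(\cdot \mid \theta) = (\theta_1, \theta_2, 1 - \theta_1 - \theta_2)$ with counting measure, and the illustrative choice $\phi = \phi_\alpha$ with $\alpha = 1.5$. The helper `second_diff` approximates $d^2\{D(p,p)\}(\theta)$ along a fixed direction $v$ by perturbing the second argument of the divergence $D$.

```python
ALPHA = 1.5  # illustrative choice: phi = phi_alpha of (2.11)


def phi(x):
    return (x - x ** ALPHA) / (ALPHA - 1)


def phi_dd(x):
    return -ALPHA * x ** (ALPHA - 2)


def psi_d(x):
    # derivative of phi(x)/x = (1 - x^(ALPHA - 1))/(ALPHA - 1)
    return -x ** (ALPHA - 2)


def J(p, q):
    return sum(2 * phi((a + b) / 2) - phi(a) - phi(b) for a, b in zip(p, q))


def K(p, q):
    return sum((a - b) * (phi(b) / b - phi(a) / a) for a, b in zip(p, q))


def L(p, q):
    return -sum(a * phi(b / a) + b * phi(a / b) for a, b in zip(p, q))


def density(theta):
    # hypothetical family: p(.|theta) = (theta1, theta2, 1 - theta1 - theta2)
    return [theta[0], theta[1], 1.0 - theta[0] - theta[1]]


theta, v, h = (0.2, 0.5), (0.3, -0.1), 1e-4
p = density(theta)
dp = [v[0], v[1], -v[0] - v[1]]  # dp(theta) along the direction v


def second_diff(div):
    """Approximates d^2/dt^2 of div(p, p(.|theta + t v)) at t = 0."""
    plus = density((theta[0] + h * v[0], theta[1] + h * v[1]))
    minus = density((theta[0] - h * v[0], theta[1] - h * v[1]))
    return (div(p, plus) - 2 * div(p, p) + div(p, minus)) / h ** 2


ds2_phi = -sum(phi_dd(a) * d * d for a, d in zip(p, dp))  # phi-entropy metric (2.9)-(2.10)
ds2_info = sum(d * d / a for a, d in zip(p, dp))          # information metric (2.16)
checks = {
    "J": (second_diff(J), 0.5 * ds2_phi),                                       # (i)
    "K": (second_diff(K), -2 * sum(psi_d(a) * d * d for a, d in zip(p, dp))),  # (ii)
    "L": (second_diff(L), -2 * phi_dd(1.0) * ds2_info),                         # (iii)
}
for name, (numeric, predicted) in checks.items():
    print(name, numeric, predicted)
```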


Further, when $\phi = \phi_\alpha$ as in (2.11), we have

(iv) $d^2\{J_\alpha(p,p)\}(\theta) = \frac{\alpha}{2}\,ds_\alpha^2(\theta)$,

(v) $d^2\{K_\alpha(p,p)\}(\theta) = 2\,ds_\alpha^2(\theta)$,

(vi) $d^2\{L_\alpha(p,p)\}(\theta) = 2\alpha\,ds^2(\theta)$,

(vii) $d^2\{M_\alpha^2(p,p)\}(\theta) = ds_\alpha^2(\theta)$,

where $ds_\alpha^2(\theta)$ is the $\alpha$-order entropy metric (2.14). The relations (i)-(vii) reflect the local properties of the $J$, $K$, $L$, $M$-divergence measures. We shall now consider their global properties in terms of their convexity as functions of the pair $(p, q)$.
3.2 The J-divergence

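We compute the Hessian of $J_\phi$ at $(p,q) \in U \times U$ along $(f,g) \in F \times F$:

$$\Delta_{(f,g)} J_\phi(p,q) = d^2J_\phi(p,q : (f,g), (f,g)).$$

By virtue of (2.4), we deduce that

$$\Delta_{(f,g)} J_\phi(p,q) = -\int_X \{a(p,q)f^2 + 2b(p,q)fg + a(q,p)g^2\}\,d\mu(x), \tag{3.8}$$

where

$$b(p,q) = -\tfrac{1}{2}\,\phi''[\tfrac{1}{2}(p+q)] \tag{3.9}$$

and

$$a(p,q) = \phi''(p) + b(p,q). \tag{3.10}$$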


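We therefore conclude that $J_\phi$ is convex (concave) on $U \times U$ if and only if $a(p,q) \le 0$ ($a(p,q) \ge 0$) and

$$d(p,q) = a(p,q)\,a(q,p) - [b(p,q)]^2 \ge 0. \tag{3.11}$$

From (3.9)-(3.11), we find that

$$a(p,q) = -\tfrac{1}{2}\,\phi''(p)\,\phi''[\tfrac{1}{2}(p+q)]\left\{\frac{1}{\phi''(p)} - \frac{2}{\phi''[\frac{1}{2}(p+q)]}\right\}$$

and

$$d(p,q) = -\tfrac{1}{2}\,\phi''(p)\,\phi''(q)\,\phi''[\tfrac{1}{2}(p+q)]\left\{\frac{1}{\phi''(p)} + \frac{1}{\phi''(q)} - \frac{2}{\phi''[\frac{1}{2}(p+q)]}\right\}.$$

Since the expression in the last curly bracket is, up to sign, the Jensen difference (or the $J$-divergence) of $(\phi'')^{-1}$ (see also [5]), we conclude: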
Theorem 2. $J_\phi(p,q)$ is convex (concave) on $U \times U$ (with respect to $F \times F \supset U \times U$) if and only if $\phi$ is concave (convex) and $(\phi'')^{-1}$ is convex (concave) on $I$.

As a corollary of this theorem we obtain the following result on $J_\alpha(p,q)$ of (3.4):

Corollary 2. $J_\alpha(p,q)$ is never concave on $F_1 \times F_1$. It is convex on $F_1 \times F_1$ if and only if $\alpha \in [1, 2]$.

Proof. The case $\alpha = 0$ is degenerate, for $J_0(p,q) \equiv 0$. We therefore assume that $\alpha \neq 0$. Also, since the case $\alpha = 1$ is a limiting case of $\alpha \to 1$, we may also assume that $\alpha \neq 1$. From (2.11) we deduce that $\phi_\alpha''(x) = -\alpha x^{\alpha-2}$ for $x \in \mathbb{R}_+$, while for $f_\alpha(x) = [\phi_\alpha''(x)]^{-1}$ we have

$$f_\alpha''(x) = \alpha^{-1}(\alpha-1)(2-\alpha)\,x^{-\alpha}, \qquad x \in \mathbb{R}_+.$$

The result follows at once from Theorem 2.
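The pointwise conditions behind Theorem 2 can also be tested directly: for $\phi = \phi_\alpha$ the sketch below evaluates $a$, $b$ and $d$ of (3.9)-(3.11) on a finite grid of values in $\mathbb{R}_+$. A grid check is only a spot check, not a proof; the grid and the tolerance are arbitrary choices.

```python
import itertools


def phi_dd(x, a):
    # phi_alpha''(x) = -alpha x^(alpha - 2), from (2.11)
    return -a * x ** (a - 2)


def jensen_hessian_ok(a, grid, tol=1e-9):
    """True if a(p,q) <= 0 and d(p,q) >= 0 of (3.9)-(3.11) hold at all grid pairs."""
    for x, y in itertools.product(grid, repeat=2):
        b = -0.5 * phi_dd((x + y) / 2, a)  # (3.9)
        axy = phi_dd(x, a) + b             # (3.10)
        ayx = phi_dd(y, a) + b
        d = axy * ayx - b * b              # (3.11)
        if axy > tol * (1 + abs(b)) or d < -tol * (1 + b * b):
            return False
    return True


grid = [0.05 * k for k in range(1, 41)] + [3.0, 10.0]
for a in (0.5, 1.0, 1.5, 2.0, 2.5):
    print(a, jensen_hessian_ok(a, grid))  # Corollary 2 predicts True exactly for a in [1, 2]
```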

For the proof of the following corollary we refer the reader to [5]:

Corollary 3. Assume that $U$ is an open convex subset of $F$ such that

$$\bigcup \{ p(x) \in \mathbb{R} : p \in U,\ x \in X \} = I = (0, 1).$$

Let $f_\alpha(x) = \phi_\alpha(x) + \phi_\alpha(1-x)$, $x \in I$, where $\phi_\alpha$ is given by (2.11). Then $J_{f_\alpha}(p,q)$ is never concave on $U \times U$. It is convex on $U \times U$ if and only if $\alpha \in [1, 2]$ or $\alpha \in [3, 11/3]$.
When $\phi$ is of class $C^4$ on the interval $I$, the conditions of Theorem 2 may be summarized as one single condition, namely that the matrix

$$M_\phi(x) = \begin{pmatrix} \phi^{(2)}(x) & \sqrt{2}\,\phi^{(3)}(x) \\ \sqrt{2}\,\phi^{(3)}(x) & \phi^{(4)}(x) \end{pmatrix}$$

be negative (or positive) semidefinite, with $\phi^{(2)}(x) \neq 0$, for all $x \in I$ (indeed, $[(\phi'')^{-1}]'' = -\det\{M_\phi\}/(\phi'')^3$). This means that $\phi^{(2)}(x) < 0$ (or $\phi^{(2)}(x) > 0$) and $\Delta_\phi(x) = \det\{M_\phi(x)\} \ge 0$ for all $x \in I$. This condition may serve to single out $\phi_1(x)$ and $\phi_2(x)$ of (2.11) and, therefore, the entropies $H_1(p)$ and $H_2(p)$ of (2.13). Indeed, the following hold:


Theorem 3. The general solution of

$$\Delta_\phi(x) = \det\{M_\phi(x)\} = 0, \qquad \phi^{(2)}(x) > 0; \qquad x \in \mathbb{R}_+, \tag{3.12}$$

is one of the following two forms:

$$\phi(x) = \frac{1}{c^2}\left[(cx + b)\log(cx + b) - cx\right] + dx + e,$$

where $c$, $b$, $d$ and $e$ are constants with $c > 0$ and $b \ge 0$, or

$$\phi(x) = ax^2 + kx + r,$$

where $a$, $k$ and $r$ are constants with $a > 0$. In particular, $\phi(x) = -\phi_1(x) = x\log x$ is the only solution of (3.12) subject to the conditions

$$\phi(1) = 0, \qquad \phi^{(1)}(1) = 1, \qquad \phi^{(2)}(1) = 1, \qquad \phi^{(3)}(1) = -1.$$
Similarly, $\phi(x) = -\phi_2(x) = x^2 - x$ is the only solution of (3.12) subject to the conditions

$$\phi(1) = 0, \qquad \phi^{(1)}(1) = 1, \qquad \phi^{(2)}(1) = 2.$$


Proof. When $\phi^{(2)}(x) = \text{const.}$, the second form is obtained. When $\phi^{(2)}(x) \neq \text{const.}$, we let $f(x) = [\phi^{(2)}(x)]^{-1}$. Then

$$f^{(2)}(x) = -\frac{\Delta_\phi(x)}{[\phi^{(2)}(x)]^3},$$

and so $f^{(2)}(x) = 0$, which means that $f(x) = cx + b$ is linear. The result follows now at once.
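For $\phi_\alpha$ the determinant $\Delta_\phi = \phi^{(2)}\phi^{(4)} - 2(\phi^{(3)})^2$ has a simple closed form, $\Delta_{\phi_\alpha}(x) = \alpha^2(\alpha-1)(2-\alpha)x^{2\alpha-6}$, which vanishes exactly at $\alpha = 1, 2$, in line with Theorem 3. This closed form is derived here from (2.11) and is not stated in the text. The short sketch below compares it with the determinant built from the derivatives, and checks that $x\log x$ gives $\Delta_\phi = 0$.

```python
def delta_phi(d2, d3, d4):
    """det of [[phi'', sqrt(2) phi'''], [sqrt(2) phi''', phi'''']] = phi'' phi'''' - 2 phi'''^2."""
    return d2 * d4 - 2.0 * d3 * d3


def derivs_phi_alpha(x, a):
    """Derivatives of order 2, 3, 4 of phi_alpha(x) = (x - x^a)/(a - 1), a != 1."""
    d2 = -a * x ** (a - 2)
    d3 = -a * (a - 2) * x ** (a - 3)
    d4 = -a * (a - 2) * (a - 3) * x ** (a - 4)
    return d2, d3, d4


def derivs_xlogx(x):
    """Derivatives of order 2, 3, 4 of x log x (= -phi_1)."""
    return 1.0 / x, -1.0 / x ** 2, 2.0 / x ** 3


x = 1.7
for a in (0.5, 1.5, 2.0, 3.0):
    closed = a * a * (a - 1) * (2 - a) * x ** (2 * a - 6)
    print(a, delta_phi(*derivs_phi_alpha(x, a)), closed)
print(delta_phi(*derivs_xlogx(x)))  # zero up to rounding
```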

3.3 The K-divergence

As for the Hessian of $K_\phi$ at $(p,q) \in F_1 \times F_1$ along $(f,g) \in F \times F$, we have, by virtue of (3.2),

$$\Delta_{(f,g)} K_\phi(p,q) = -\int_X \{a(p,q)f^2 + 2b(p,q)fg + a(q,p)g^2\}\,d\mu(x), \tag{3.13}$$

where, for $x, y \in \mathbb{R}_+$,

$$a(x,y) = \phi''(x) - y\,\psi''(x), \qquad b(x,y) = -[\psi'(x) + \psi'(y)], \qquad \psi(x) = \phi(x)/x. \tag{3.14}$$

It therefore follows that $K_\phi(p,q)$ is convex on $F_1 \times F_1$ if and only if $a(x,y) \le 0$ and

$$d(x,y) \equiv a(x,y)\,a(y,x) - [b(x,y)]^2 \ge 0; \qquad x, y \in \mathbb{R}_+. \tag{3.15}$$
However, from (3.14) it is seen that $a(x,y) \le 0$ whenever $\phi$ is concave and $\psi$ is convex on $\mathbb{R}_+$, a situation identical with that of Theorem 1(v). We clearly have:

Theorem 4. $K_\phi(p,q)$ is convex on $F_1 \times F_1$ (with respect to $F \times F$) if $\phi$ is concave, $\psi$ is convex on $\mathbb{R}_+$, and (3.15) holds.

As a corollary we obtain the following result on $K_\alpha(p,q)$ of (3.5), the proof of which is to be found in [5]:

Corollary 4. $K_\alpha(p,q)$ is convex on $F_1 \times F_1$ for all $\alpha \in [1, 2]$.
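Corollary 4 can be spot-checked in the same way as Corollary 2. For $\phi = \phi_\alpha$ one has $\psi(x) = \phi_\alpha(x)/x$ with $\psi'(x) = -x^{\alpha-2}$ and $\psi''(x) = (2-\alpha)x^{\alpha-3}$ (these formulas also cover $\alpha = 1$), and the sketch below tests $a(x,y) \le 0$ and $d(x,y) \ge 0$ of (3.14)-(3.15) on a grid. This is an illustration under arbitrary grid and tolerance choices, not a proof.

```python
import itertools


def k_hessian_ok(a, grid, tol=1e-9):
    """True if a(x,y) <= 0 and d(x,y) >= 0 of (3.14)-(3.15) hold at all grid pairs,
    for phi = phi_alpha and psi(x) = phi_alpha(x)/x."""

    def phi_dd(x):
        return -a * x ** (a - 2)

    def psi_d(x):
        return -x ** (a - 2)

    def psi_dd(x):
        return (2 - a) * x ** (a - 3)

    for x, y in itertools.product(grid, repeat=2):
        axy = phi_dd(x) - y * psi_dd(x)
        ayx = phi_dd(y) - x * psi_dd(y)
        b = -(psi_d(x) + psi_d(y))
        d = axy * ayx - b * b
        if axy > tol * (1 + abs(b)) or d < -tol * (1 + b * b):
            return False
    return True


grid = [0.05 * k for k in range(1, 41)] + [3.0, 10.0]
for a in (1.0, 1.5, 2.0, 3.0):
    print(a, k_hessian_ok(a, grid))
```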

3.4 The L-divergence

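The Hessian of $L_\phi$ at $(p,q) \in F_1 \times F_1$ along $(f,g) \in F \times F$ is, in view of (3.3),

$$\Delta_{(f,g)} L_\phi(p,q) = -\int_X \{a(p,q)f^2 + 2b(p,q)fg + a(q,p)g^2\}\,d\mu(x), \tag{3.16}$$

where

$$a(p,q) = \frac{1}{q}\,\phi''\!\left(\frac{p}{q}\right) + \frac{q^2}{p^3}\,\phi''\!\left(\frac{q}{p}\right) \tag{3.17}$$

and

$$b(p,q) = -\left[\frac{p}{q^2}\,\phi''\!\left(\frac{p}{q}\right) + \frac{q}{p^2}\,\phi''\!\left(\frac{q}{p}\right)\right].$$

In this case, the discriminant

$$d(x,y) = a(x,y)\,a(y,x) - [b(x,y)]^2$$

is identically zero on $\mathbb{R}_+ \times \mathbb{R}_+$. This leads to the following result (see also [5]):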


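Theorem 5. $L_\phi(p,q)$ is convex (concave) on $F_1 \times F_1$ (with respect to $F \times F$) if and only if the function $F(x) \equiv x\phi(x^{-1}) + \phi(x)$ is concave (convex) on $\mathbb{R}_+$.

Proof. Since $d(x,y) \equiv 0$ on $\mathbb{R}_+ \times \mathbb{R}_+$, $L_\phi(p,q)$ is convex on $F_1 \times F_1$ if and only if

$$a(x,y) = \frac{1}{y}\left\{\phi''\!\left(\frac{x}{y}\right) + \frac{y^3}{x^3}\,\phi''\!\left(\frac{y}{x}\right)\right\} \le 0; \qquad (x,y) \in \mathbb{R}_+ \times \mathbb{R}_+.$$

Putting $t = x/y$, this condition becomes

$$t^{-3}\phi''(t^{-1}) + \phi''(t) \le 0; \qquad t \in \mathbb{R}_+.$$

Since $F''(t) = t^{-3}\phi''(t^{-1}) + \phi''(t)$, this means that $F''(t) \le 0$, and the result follows; the concave case is analogous.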


In particular, for $L_\alpha(p,q)$ of (3.6) we have:

Corollary 5. $L_\alpha(p,q)$ is convex (concave) on $F_1 \times F_1$ for all $\alpha \ge 0$ (for all $\alpha \le 0$).

Proof. The case $\alpha = 0$ is degenerate, for then $L_0(p,q) \equiv 0$. Also, by continuity we may assume that $\alpha \neq 1$. From (2.11) and $F_\alpha(x) = x\phi_\alpha(x^{-1}) + \phi_\alpha(x)$, we deduce that

$$F_\alpha''(x) = -\alpha\,(x^{\alpha-2} + x^{-\alpha-1}); \qquad x \in \mathbb{R}_+,$$

and the result follows.
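The formula for $F_\alpha''$ in the proof above can be confirmed with a central second difference. The sketch assumes $\alpha \neq 1$ and uses illustrative sample points; the function names are hypothetical.

```python
def F_alpha(x, a):
    """F_alpha(x) = x phi_alpha(1/x) + phi_alpha(x), with phi_alpha of (2.11), a != 1."""

    def phi(t):
        return (t - t ** a) / (a - 1)

    return x * phi(1 / x) + phi(x)


def F_dd_closed(x, a):
    """The expression -alpha (x^(alpha - 2) + x^(-alpha - 1)) from the proof of Corollary 5."""
    return -a * (x ** (a - 2) + x ** (-a - 1))


def F_dd_numeric(x, a, h=1e-4):
    """Central second difference of F_alpha at x."""
    return (F_alpha(x + h, a) - 2 * F_alpha(x, a) + F_alpha(x - h, a)) / h ** 2


for a in (-1.5, 0.5, 2.5):
    for x in (0.5, 1.3, 4.0):
        print(a, x, F_dd_numeric(x, a), F_dd_closed(x, a))
```

The sign of the closed form is that of $-\alpha$, which is what drives the convex/concave split in Corollary 5.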


4. GEODESIC DISTANCES

We return to the $\alpha$-order entropy metric in (2.14)-(2.15). The emphasis of the subsequent analysis will be on finding the $\alpha$-order entropy pseudo-distance $S_\alpha$ for known multiparametric families of probability distributions. When $\alpha = 1$, such an analysis was carried out by Rao [13] and more recently by Atkinson and Mitchell [1], where the distance $S$ is explicitly evaluated for certain multiparametric families $F_\Omega$. We shall not repeat the examples of [1, 13], for their extensions to the case of $\alpha \neq 1$ are not particularly difficult. An exception will be made for families of normal distributions, where it seems that the present analysis is slightly more general and, perhaps, simpler than that found in [1, 13].

Being the geodesic pseudo-distance induced by $ds_\alpha^2(\theta)$ of (2.14)-(2.15), $S_\alpha$ may be evaluated with the aid of the Euler-Lagrange equations, which involve the Christoffel symbols based on the $\alpha$-order entropy matrix $[g_{km}^{(\alpha)}(\theta)]$ of (2.15). In general, such an undertaking may prove difficult as far as an explicit closed expression for $S_\alpha$ is sought.

4.1 Multinomial Distributions

Consider a multinomial discrete distribution $p(x \mid \theta) = p(x \mid \theta_1, \ldots, \theta_n)$, where the sample space $X$ is the set of integers $X = X_n = \{1, 2, \ldots, n\}$ and $p(k \mid \theta) = \theta_k$ for $k \in X_n$. In this case, $g_{km}^{(\alpha)}$ of (2.15) is $\theta_k^{\alpha-2}\delta_{km}$, and we may use the identification

$$p_k \equiv p(k \mid \theta) = \theta_k; \qquad k = 1, \ldots, n.$$

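We shall assume first that

$$p = (p_1, \ldots, p_n) \in \mathbb{R}_+^n$$

and then make the restriction $p \in \Omega_n$, where

$$\Omega_n = \Big\{ p \in \mathbb{R}^n : \sum_{k=1}^{n} p_k = 1,\ 0 < p_k < 1,\ k = 1, \ldots, n \Big\}.$$

With these considerations the metric of (2.14) may be expressed as

$$ds_\alpha^2(p) = \sum_{k=1}^{n} p_k^{\alpha-2}\,(dp_k)^2, \qquad p \in \mathbb{R}_+^n.$$

The fundamental tensor of the metric is of rank $n$ and, therefore, $S_\alpha$ is indeed a distance. The evaluation of this geodesic distance is immediate (the substitution $u_k = 2\alpha^{-1}p_k^{\alpha/2}$, or $u_k = \log p_k$ when $\alpha = 0$, turns the metric into the Euclidean metric $\sum_k (du_k)^2$), and, for $p, q \in \mathbb{R}_+^n$, we have

$$S_\alpha(p,q) = \begin{cases} 2|\alpha|^{-1}\Big\{\sum_{k=1}^{n} \big[p_k^{\alpha/2} - q_k^{\alpha/2}\big]^2\Big\}^{1/2}, & \alpha \neq 0, \\ \Big\{\sum_{k=1}^{n} [\log p_k - \log q_k]^2\Big\}^{1/2}, & \alpha = 0, \end{cases}$$

which is (modulo a factor of $\sqrt{2}$) the $\alpha$-order Hellinger distance $M_\alpha(p,q)$ as in (3.7). The same results hold with the restriction $p, q \in \Omega_n$.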
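The sketch below codes $S_\alpha$ for $p, q \in \mathbb{R}^n_+$ and $M_\alpha$ of (3.7) with counting measure. It also measures, under $ds_\alpha^2$, the length of the path that is a straight line in the coordinates $u_k = 2\alpha^{-1}p_k^{\alpha/2}$, in which the metric becomes Euclidean; that length should reproduce $S_\alpha$. The helper names and the sample points are hypothetical.

```python
import math


def s_alpha(p, q, a):
    """Geodesic distance of ds^2_alpha = sum_k p_k^(alpha - 2) dp_k^2 on R^n_+ (Section 4.1)."""
    if a == 0:
        return math.sqrt(sum((math.log(x) - math.log(y)) ** 2 for x, y in zip(p, q)))
    return 2 / abs(a) * math.sqrt(sum((x ** (a / 2) - y ** (a / 2)) ** 2 for x, y in zip(p, q)))


def m_alpha(p, q, a):
    """alpha-order Hellinger distance (3.7) with counting measure, a != 0."""
    s = sum((x ** (a / 2) - y ** (a / 2)) ** 2 for x, y in zip(p, q))
    return math.sqrt(2) / abs(a) * math.sqrt(s)


def straight_line_length(p, q, a, steps=2000):
    """Length under ds^2_alpha (a > 0) of the path that is straight in u_k = (2/a) p_k^(a/2)."""

    def to_p(u):
        return (a * u / 2) ** (2 / a)

    up = [2 / a * x ** (a / 2) for x in p]
    uq = [2 / a * y ** (a / 2) for y in q]
    total = 0.0
    prev = list(p)
    for i in range(1, steps + 1):
        t = i / steps
        cur = [to_p((1 - t) * s + t * r) for s, r in zip(up, uq)]
        mid = [(x + y) / 2 for x, y in zip(prev, cur)]
        total += math.sqrt(sum(m ** (a - 2) * (y - x) ** 2 for m, x, y in zip(mid, prev, cur)))
        prev = cur
    return total


p, q = [0.2, 0.5, 0.3], [0.6, 0.1, 0.3]
print(s_alpha(p, q, 1.5), math.sqrt(2) * m_alpha(p, q, 1.5), straight_line_length(p, q, 1.5))
```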


4.2 Normal Distributions


We first consider a two-parameter family of normal distributions $p(\cdot \mid \mu, \sigma) = N(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$ ($-\infty < \mu < \infty$; $\sigma > 0$). Here, for reasons of convergence, we must assume that $\alpha > 0$. Fixing $\alpha > 0$, it will be found convenient to introduce new variables $x$ and $y$ ($-\infty < x < \infty$; $y > 0$) via

$$y = \sigma, \qquad x = \{A(\alpha)\}^{-1/2}\mu; \qquad A(\alpha) \equiv (\alpha^{1/2} - \alpha^{-1/2})^2 + 2\alpha^{-1}, \quad \alpha > 0. \tag{4.1}$$

We may consider the complex parameter

$$z = x + iy \in U = \{ z \in \mathbb{C} : \operatorname{Im} z > 0 \}, \tag{4.2}$$

with $U$ being the upper half-plane. In this way $p(\cdot \mid \mu, \sigma)$ is replaced by $p(\cdot \mid z) \equiv N(\mu, \sigma^2)$ with $z \in U$ as in (4.1)-(4.2). Now, a routine calculation, omitted here, shows that the metric (2.14)-(2.15) admits the form

$$ds_\alpha^2(z) = B(\alpha)\,y^{-(\alpha+1)}\,|dz|^2, \tag{4.3}$$

where

$$B(\alpha) \equiv \alpha^{-3/2}(2\pi)^{(1-\alpha)/2}A(\alpha), \qquad \alpha > 0. \tag{4.4}$$

The metric in (4.3) constitutes a Kähler metric on the upper half-plane $U$, and when $\alpha = 1$ it reduces to the familiar Poincaré metric. The Gaussian curvature of (4.3) is

$$\kappa_\alpha(z) = -(\alpha+1)\{2B(\alpha)\}^{-1}y^{\alpha-1}; \qquad y = \operatorname{Im} z > 0,\ \alpha > 0,$$
and is always negative. In particular, $\kappa_1(z) = -2^{-1}$. In this case, $S_\alpha$ is indeed a distance on $U$, and $S = S_1$ is the familiar hyperbolic distance of $U$.
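The "routine calculation" behind (4.3)-(4.4) can be checked numerically. In the $(\mu, \sigma)$ coordinates the matrix (2.15) is diagonal, and (4.3) amounts to $A(\alpha)\,g^{(\alpha)}_{\mu\mu} = g^{(\alpha)}_{\sigma\sigma} = B(\alpha)\sigma^{-(\alpha+1)}$. The sketch below evaluates both entries by Simpson's rule on a truncated interval, an assumption that is harmless here because the integrands decay like Gaussians. The sample values of $\mu$, $\sigma$ and $\alpha$ are arbitrary.

```python
import math


def normal_pdf(t, mu, sigma):
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    s += 4 * sum(f(lo + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(f(lo + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3


def A(a):
    return (math.sqrt(a) - 1 / math.sqrt(a)) ** 2 + 2 / a  # (4.1)


def B(a):
    return a ** -1.5 * (2 * math.pi) ** ((1 - a) / 2) * A(a)  # (4.4)


def entropy_matrix_normal(mu, sigma, a):
    """Diagonal entries (g_mu_mu, g_sigma_sigma) of (2.15) for N(mu, sigma^2)."""

    def weight(t):
        return normal_pdf(t, mu, sigma) ** a

    def g_mm_integrand(t):
        return weight(t) * ((t - mu) / sigma ** 2) ** 2  # (d/dmu log p)^2

    def g_ss_integrand(t):
        return weight(t) * (-1 / sigma + (t - mu) ** 2 / sigma ** 3) ** 2  # (d/dsigma log p)^2

    lo, hi = mu - 15 * sigma, mu + 15 * sigma
    return simpson(g_mm_integrand, lo, hi), simpson(g_ss_integrand, lo, hi)


mu, sigma = 0.7, 1.3
for a in (0.5, 1.0, 2.0):
    g_mm, g_ss = entropy_matrix_normal(mu, sigma, a)
    target = B(a) * sigma ** -(a + 1)  # coefficient of |dz|^2 in (4.3) at y = sigma
    print(a, A(a) * g_mm, g_ss, target)
```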

We now treat this distance $S_\alpha$ ($\alpha > 0$):

1. The case of $\alpha = 1$: in this case, by (4.1)-(4.4),

$$ds^2(z) = 2y^{-2}|dz|^2. \tag{4.5}$$

Elementary arguments based on the invariance properties of this metric of Poincaré lead to the following geodesic distance (or "Poincaré distance"):

$$S(z,\zeta) = \sqrt{2}\,\log\frac{1 + \delta(z,\zeta)}{1 - \delta(z,\zeta)}; \qquad z, \zeta \in U, \tag{4.6}$$

where

$$\delta(z,\zeta) = \left|\frac{z - \zeta}{z - \bar{\zeta}}\right|; \qquad z, \zeta \in U. \tag{4.7}$$

It should be noted that $\delta = \delta(z,\zeta)$ is also a distance on $U$ and is called the "Möbius distance" (see also [3, 4] for further generalizations of these distances). Also, the geodesics of (4.5) (see, for example, (4.16)) are given by the "semi-circles"

$$z = a + re^{i\theta}; \qquad r > 0,\ 0 < \theta < \pi, \tag{4.8}$$

where $a$ is a real fixed constant.

Expressed in terms of the original parameters $\mu$ and $\sigma$, the distance in (4.6), by virtue of (4.1) and (4.7), may be written as

$$S(\mu_1, \sigma_1; \mu_2, \sigma_2) = \sqrt{2}\,\log\frac{1 + \delta(\mu_1, \sigma_1; \mu_2, \sigma_2)}{1 - \delta(\mu_1, \sigma_1; \mu_2, \sigma_2)}, \tag{4.9}$$

where

$$\delta(\mu_1, \sigma_1; \mu_2, \sigma_2) = \left[\frac{(\mu_1 - \mu_2)^2 + 2(\sigma_1 - \sigma_2)^2}{(\mu_1 - \mu_2)^2 + 2(\sigma_1 + \sigma_2)^2}\right]^{1/2} \tag{4.10}$$

is the Möbius distance (4.7) in terms of $\mu$ and $\sigma$. This is the required distance between $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. It agrees with a rather more involved expression obtained by Atkinson and Mitchell [1]; the expression in [1] can be obtained from (4.9) by using (4.1), (4.8) and (4.10). Note that always

$$0 \le \delta(\mu_1, \sigma_1; \mu_2, \sigma_2) < 1.$$

On the other hand, since $\log[(1+\delta)/(1-\delta)] \ge 2\delta$ for $0 \le \delta < 1$, the Poincaré distance $S(\mu_1, \sigma_1; \mu_2, \sigma_2)$ satisfies

$$S(\mu_1, \sigma_1; \mu_2, \sigma_2) \ge 2\sqrt{2}\,\delta(\mu_1, \sigma_1; \mu_2, \sigma_2).$$

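The Hellinger pseudo-distance (3.7) between $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ is, in this case, a proper distance with the following form:

$$M(\mu_1, \sigma_1; \mu_2, \sigma_2) = 2\left[1 - \left(\frac{2\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2}\right)^{1/2} e^{-(\mu_1 - \mu_2)^2/4(\sigma_1^2 + \sigma_2^2)}\right]^{1/2}. \tag{4.11}$$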
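A sketch of (4.9)-(4.11) for two normal distributions: it computes the Möbius distance (4.10), the Poincaré distance (4.9), the same distance through the complex coordinates of (4.1)-(4.2) with $\alpha = 1$, and the Hellinger distance (4.11), together with a quadrature version of (3.7) for comparison. The function names, integration range and step count are illustrative choices.

```python
import math


def delta_normal(m1, s1, m2, s2):
    """Mobius distance (4.10) between N(m1, s1^2) and N(m2, s2^2)."""
    num = (m1 - m2) ** 2 + 2 * (s1 - s2) ** 2
    den = (m1 - m2) ** 2 + 2 * (s1 + s2) ** 2
    return math.sqrt(num / den)


def poincare_normal(m1, s1, m2, s2):
    """Information (alpha = 1) geodesic distance (4.9)."""
    d = delta_normal(m1, s1, m2, s2)
    return math.sqrt(2) * math.log((1 + d) / (1 - d))


def poincare_half_plane(z, w):
    """(4.6)-(4.7) for z, w in the upper half-plane."""
    d = abs((z - w) / (z - w.conjugate()))
    return math.sqrt(2) * math.log((1 + d) / (1 - d))


def to_half_plane(m, s):
    """Coordinates (4.1)-(4.2) with alpha = 1, where A(1) = 2."""
    return complex(m / math.sqrt(2), s)


def hellinger_normal(m1, s1, m2, s2):
    """Closed form (4.11)."""
    v = s1 ** 2 + s2 ** 2
    return 2 * math.sqrt(1 - math.sqrt(2 * s1 * s2 / v) * math.exp(-(m1 - m2) ** 2 / (4 * v)))


def hellinger_quadrature(m1, s1, m2, s2, n=6000):
    """(3.7) with alpha = 1, i.e. sqrt(2) [integral of (sqrt p - sqrt q)^2]^(1/2), by Simpson's rule."""

    def pdf(t, m, s):
        return math.exp(-0.5 * ((t - m) / s) ** 2) / (math.sqrt(2 * math.pi) * s)

    def f(t):
        return (math.sqrt(pdf(t, m1, s1)) - math.sqrt(pdf(t, m2, s2))) ** 2

    lo = min(m1 - 20 * s1, m2 - 20 * s2)
    hi = max(m1 + 20 * s1, m2 + 20 * s2)
    h = (hi - lo) / n
    s = f(lo) + f(hi) + 4 * sum(f(lo + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(f(lo + 2 * i * h) for i in range(1, n // 2))
    return math.sqrt(2) * math.sqrt(s * h / 3)


args = (0.3, 1.2, -1.1, 0.7)
print(poincare_normal(*args), poincare_half_plane(to_half_plane(0.3, 1.2), to_half_plane(-1.1, 0.7)))
print(hellinger_normal(*args), hellinger_quadrature(*args))
```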


2. The case of $\alpha \neq 1$: in this case, the geodesic distance of the metric (4.3) is not as easily made explicit as in the former case. We shall first find all the geodesics of this metric. This may, of course, be done with the aid of the Christoffel symbols of the metric (4.3). We shall, however, proceed directly, for reasons of economy and clarity. Writing

$$\beta = (\alpha + 1)/2; \qquad \beta \neq 1, \tag{4.12}$$

finding the geodesics of (4.3) amounts to solving the following extremal problem of the calculus of variations (the factor $B(\alpha) > 0$ is irrelevant here):

$$\min \int_a^b y^{-\beta}\sqrt{1 + (y')^2}\,dx, \qquad y > 0,$$

where the minimum is taken over all $C^2$-paths $y = f(x)$ joining the points $(a, f(a))$ and $(b, f(b))$. A routine calculation based on the Lagrangian $y^{-\beta}\sqrt{1 + (y')^2}$ shows that the Euler-Lagrange equation of this problem admits the simple form

$$yy'' = -\beta[1 + (y')^2]. \tag{4.13}$$

In order to solve (4.13) we proceed with standard methods, letting $p = y'$, so that $y'' = p\,dp/dy$, to obtain

$$-\beta\,d(\log y) = \tfrac{1}{2}\,d[\log(1 + p^2)].$$

This shows that

$$y^{-\beta} = r^{-1}(1 + p^2)^{1/2}, \quad\text{that is,}\quad \frac{dx}{dy} = \pm\frac{y^\beta}{(r^2 - y^{2\beta})^{1/2}}; \qquad r > y^\beta > 0,$$

with $r > 0$ being a constant. Consequently,

$$x = a \pm \int \frac{y^\beta}{(r^2 - y^{2\beta})^{1/2}}\,dy, \tag{4.14}$$

where $a$ is an arbitrary constant of integration. We may use the substitution $y = r^{1/\beta}\sin^{1/\beta}\theta$, $0 < \theta < \pi$, and, upon introducing the one-parameter family of functions

$$F_\gamma(\theta) = \int_{\pi/2}^{\theta} \sin^\gamma t\,dt; \qquad \gamma \in \mathbb{R},\ 0 < \theta < \pi, \tag{4.15}$$

the solution (4.14) may be written in the parametric form

$$x = a \pm \beta^{-1}r^{1/\beta}F_{1/\beta}(\theta), \qquad y = r^{1/\beta}\sin^{1/\beta}\theta; \qquad r > 0,\ 0 < \theta < \pi. \tag{4.16}$$

When $\beta = 1$ or, by (4.12), when $\alpha = 1$, (4.16) reduces to (4.8). Equation (4.16) gives all the geodesics of the problem. We also note that the geodesics in (4.16) include the lines $x = \text{const.}$ as a limiting case, corresponding to $r \to \infty$.
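Along (4.16) one has $dy/dx = \cot\theta$, so the first integral $y^\beta\{1 + (y')^2\}^{1/2} = r$ obtained above holds identically. The sketch below checks this numerically, with $F_\gamma$ of (4.15) computed by Simpson's rule and the slope by a central difference in $\theta$. For $\alpha = 1$ it also checks that the points lie on the semicircle (4.8). The parameters $a = 0$, $r = 1.5$ are arbitrary, and the names are illustrative.

```python
import math


def F(gamma, theta, n=2000):
    """F_gamma(theta) of (4.15): integral of sin(t)**gamma from pi/2 to theta (Simpson's rule)."""
    lo = math.pi / 2
    h = (theta - lo) / n

    def f(t):
        return math.sin(t) ** gamma

    s = f(lo) + f(theta)
    s += 4 * sum(f(lo + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(f(lo + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3


def geodesic_point(theta, alpha, a=0.0, r=1.5):
    """Point (x, y) of the geodesic (4.16) with the '+' sign; beta = (alpha + 1)/2 as in (4.12)."""
    beta = (alpha + 1) / 2
    scale = r ** (1 / beta)
    return a + scale * F(1 / beta, theta) / beta, scale * math.sin(theta) ** (1 / beta)


def first_integral(theta, alpha, r=1.5, h=1e-5):
    """y^beta * sqrt(1 + (dy/dx)^2) at parameter theta; should equal r along the curve."""
    beta = (alpha + 1) / 2
    x0, y0 = geodesic_point(theta - h, alpha, r=r)
    x1, y1 = geodesic_point(theta + h, alpha, r=r)
    _, y = geodesic_point(theta, alpha, r=r)
    slope = (y1 - y0) / (x1 - x0)  # dy/dx from a central difference in theta
    return y ** beta * math.sqrt(1 + slope ** 2)


for alpha in (0.5, 2.0, 3.0):
    print(alpha, [first_integral(t, alpha) for t in (0.4, 1.0, 2.2)])  # each close to r = 1.5
```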

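An expression for $S_\alpha(z,\zeta)$, $z, \zeta \in U$, may now be given by using (4.3) and (4.16). We have

$$S_\alpha(z,\zeta) = \beta^{-1}\sqrt{B(\alpha)}\,r^{1/\beta - 1}\,\big|F_{1/\beta - 2}(\theta_2) - F_{1/\beta - 2}(\theta_1)\big|, \tag{4.17}$$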
where, after choosing, without loss, the (+) sign in (4.16),

$$\begin{aligned} z &= x + iy, & x &= a + \beta^{-1}r^{1/\beta}F_{1/\beta}(\theta_1), & y &= r^{1/\beta}\sin^{1/\beta}\theta_1, \\ \zeta &= \xi + i\eta, & \xi &= a + \beta^{-1}r^{1/\beta}F_{1/\beta}(\theta_2), & \eta &= r^{1/\beta}\sin^{1/\beta}\theta_2. \end{aligned} \tag{4.18}$$

Using (4.1), (4.4), (4.12) and (4.15), one deduces immediately that (4.17) reduces to (4.6) when $\alpha = 1$. In general, the quantities $\theta_1$, $\theta_2$ and $r$ are determined by the given $z = x + iy$, $\zeta = \xi + i\eta \in U$ via (4.18). However, except for special values of $\alpha > 0$ where integrals of type (4.15) can be further explicated, finding a closed-form formula for $S_\alpha(z,\zeta)$ in terms of $z$ and $\zeta$ may prove difficult.

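One may use an alternative expression for $S_\alpha(z,\zeta)$ which, sometimes, is simpler than that of (4.17). It is based on the recursive formula

$$(\gamma - 1)F_{\gamma-2}(\theta) = \gamma F_\gamma(\theta) + \cos\theta\,\sin^{\gamma-1}\theta,$$

valid for all real $\gamma$ and easily derived from (4.15) by integration by parts. Using this formula, together with (4.12) and (4.18), (4.17) becomes

$$S_\alpha(z,\zeta) = \frac{2\sqrt{B(\alpha)}}{|1-\alpha|}\left|\frac{x - \xi}{r} + y^{\frac{1}{2}(1-\alpha)}\big(1 - r^{-2}y^{\alpha+1}\big)^{1/2} - \eta^{\frac{1}{2}(1-\alpha)}\big(1 - r^{-2}\eta^{\alpha+1}\big)^{1/2}\right|, \tag{4.19}$$

where the two square roots stand for $\cos\theta_1$ and $\cos\theta_2$, and so change sign when $\theta_j > \pi/2$.

Letting $r \to \infty$ in (4.19) corresponds to the geodesic $x = \text{const.}$, and, accordingly,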
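$$S_\alpha(z,\zeta) = \frac{2\sqrt{B(\alpha)}}{|1-\alpha|}\left|y^{\frac{1}{2}(1-\alpha)} - \eta^{\frac{1}{2}(1-\alpha)}\right|; \qquad z, \zeta \in U,\ \operatorname{Re} z = \operatorname{Re}\zeta.$$

This, of course, agrees with (4.6) as $\alpha \to 1$ and $\operatorname{Re} z = \operatorname{Re}\zeta$.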
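The two expressions (4.17) and (4.19) can be compared numerically, together with a direct evaluation of the length of the geodesic arc under (4.3). The sketch assumes $\alpha = 2$, $a = 0$, $r = 1.5$ and $\theta_1, \theta_2 < \pi/2$, so that the square roots in (4.19) equal $\cos\theta_1$ and $\cos\theta_2$; all names and parameter values are illustrative.

```python
import math


def F(gamma, theta, n=2000):
    """F_gamma(theta) of (4.15) by Simpson's rule, integrating from pi/2 to theta."""
    lo = math.pi / 2
    h = (theta - lo) / n

    def f(t):
        return math.sin(t) ** gamma

    s = f(lo) + f(theta)
    s += 4 * sum(f(lo + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(f(lo + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3


def A(a):
    return (math.sqrt(a) - 1 / math.sqrt(a)) ** 2 + 2 / a  # (4.1)


def B(a):
    return a ** -1.5 * (2 * math.pi) ** ((1 - a) / 2) * A(a)  # (4.4)


alpha, a0, r = 2.0, 0.0, 1.5  # illustrative choices
beta = (alpha + 1) / 2        # (4.12)
th1, th2 = 0.5, 1.2           # both below pi/2


def point(th):
    """Point of the geodesic (4.16), '+' sign, as x + iy (cf. (4.18))."""
    scale = r ** (1 / beta)
    return complex(a0 + scale * F(1 / beta, th) / beta, scale * math.sin(th) ** (1 / beta))


z, zeta = point(th1), point(th2)

# (4.17)
s_417 = math.sqrt(B(alpha)) / beta * r ** (1 / beta - 1) * abs(F(1 / beta - 2, th2) - F(1 / beta - 2, th1))


def root(y):
    # y^((1 - alpha)/2) * cos(theta), with cos(theta) >= 0 here
    return y ** ((1 - alpha) / 2) * math.sqrt(1 - y ** (alpha + 1) / r ** 2)


# (4.19), which uses only z, zeta and r
s_419 = 2 * math.sqrt(B(alpha)) / abs(1 - alpha) * abs((z.real - zeta.real) / r + root(z.imag) - root(zeta.imag))

# Direct length under (4.3): sum of sqrt(B) y^(-beta) |dz| over small chords
steps = 400
pts = [point(th1 + (th2 - th1) * k / steps) for k in range(steps + 1)]
arc = sum(math.sqrt(B(alpha)) * ((u.imag + w.imag) / 2) ** -beta * abs(w - u)
          for u, w in zip(pts, pts[1:]))

print(s_417, s_419, arc)
```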

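The $\alpha$-order distance $S_\alpha(\mu_1, \sigma_1; \mu_2, \sigma_2)$ between $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ can be derived from (4.19) by using (4.1), (4.4) and (4.18). In particular,

$$S_\alpha(\mu, \sigma_1; \mu, \sigma_2) = \frac{2\sqrt{B(\alpha)}}{|1-\alpha|}\left|\sigma_1^{\frac{1}{2}(1-\alpha)} - \sigma_2^{\frac{1}{2}(1-\alpha)}\right|,$$

which agrees with (4.9) as $\alpha \to 1$ and $\mu = \mu_1 = \mu_2$.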

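The $\alpha$-order Hellinger distance between $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ is now

$$M_\alpha(\mu_1, \sigma_1; \mu_2, \sigma_2) = \sqrt{2}\,\alpha^{-5/4}(2\pi)^{(1-\alpha)/4}\left\{\sigma_1^{1-\alpha} + \sigma_2^{1-\alpha} - 2E_\alpha(\mu_1, \sigma_1; \mu_2, \sigma_2)\right\}^{1/2},$$

where

$$E_\alpha(\mu_1, \sigma_1; \mu_2, \sigma_2) = (\sigma_1\sigma_2)^{(1-\alpha)/2}\left(\frac{2\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2}\right)^{1/2} e^{-\alpha(\mu_1 - \mu_2)^2/4(\sigma_1^2 + \sigma_2^2)}.$$

When $\alpha = 1$, this formula reduces to (4.11).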
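The closed form for $M_\alpha$ between two normal densities can be compared with a direct quadrature of (3.7). The sketch below does this with Simpson's rule on a truncated interval; the integration range, step count and sample parameters are assumptions made for illustration.

```python
import math


def normal_pdf(t, mu, sigma):
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


def simpson(f, lo, hi, n=6000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    s += 4 * sum(f(lo + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(f(lo + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3


def m_alpha_quadrature(m1, s1, m2, s2, a):
    """(3.7) for two normal densities by numerical integration (a > 0)."""

    def integrand(t):
        return (normal_pdf(t, m1, s1) ** (a / 2) - normal_pdf(t, m2, s2) ** (a / 2)) ** 2

    lo = min(m1 - 25 * s1, m2 - 25 * s2)
    hi = max(m1 + 25 * s1, m2 + 25 * s2)
    return math.sqrt(2) / a * math.sqrt(simpson(integrand, lo, hi))


def m_alpha_closed(m1, s1, m2, s2, a):
    """Closed form of M_alpha for N(m1, s1^2) and N(m2, s2^2) given in the text (a > 0)."""
    v = s1 ** 2 + s2 ** 2
    e = (s1 * s2) ** ((1 - a) / 2) * math.sqrt(2 * s1 * s2 / v) * math.exp(-a * (m1 - m2) ** 2 / (4 * v))
    c = math.sqrt(2) * a ** -1.25 * (2 * math.pi) ** ((1 - a) / 4)
    return c * math.sqrt(s1 ** (1 - a) + s2 ** (1 - a) - 2 * e)


args = (0.3, 1.2, -1.1, 0.7)
for a in (0.5, 1.0, 2.0):
    print(a, m_alpha_closed(*args, a), m_alpha_quadrature(*args, a))
```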

4.3 Products of Normal Distributions

The previous methods can be extended to products of normal distributions

$$p(x \mid \theta) = \prod_{k=1}^{n} N(x_k : \mu_k, \sigma_k^2), \tag{4.20}$$
where $x = (x_1, \ldots, x_n) \in X = \mathbb{R}^n$ and $\theta = (\mu_1, \sigma_1, \ldots, \mu_n, \sigma_n) \in \mathbb{R}^{2n}$, with means $\mu_k$ and variances $\sigma_k^2$ ($-\infty < \mu_k < \infty$, $\sigma_k > 0$; $k = 1, \ldots, n$).

As in (4.1)-(4.2), we find it convenient to introduce new variables. Accordingly, we replace $x = (x_1, \ldots, x_n)$ by $t = (t_1, \ldots, t_n) \in X = \mathbb{R}^n$ and write for the parameters

$$y_k = \sigma_k, \qquad x_k = \{A(\alpha)\}^{-1/2}\mu_k; \qquad A(\alpha) \equiv (\alpha^{1/2} - \alpha^{-1/2})^2 + 2\alpha^{-1}, \quad \alpha > 0, \tag{4.21}$$

and

$$z = (z_1, \ldots, z_n), \qquad z_k = x_k + iy_k \quad (-\infty < x_k < \infty;\ y_k > 0);\ k = 1, \ldots, n. \tag{4.22}$$

Plainly, we view the distribution in (4.20) as $p(t \mid z)$ with $t$ in the sample space $X$ and $z \in U^n$, $n$ copies of the upper half-plane $U$.

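As in (4.3), the metric (2.14)-(2.15) admits here the form

$$ds_\alpha^2(z) = B_n(\alpha)\,(y_1 \cdots y_n)^{1-\alpha}\left[\sum_{k=1}^{n} y_k^{-2}|dz_k|^2 + \frac{(1-\alpha)^2}{\alpha A(\alpha)}\sum_{k \neq l} \frac{dy_k\,dy_l}{y_k\,y_l}\right], \tag{4.23}$$

where

$$B_n(\alpha) \equiv \alpha^{-(n+2)/2}(2\pi)^{n(1-\alpha)/2}A(\alpha), \qquad \alpha > 0. \tag{4.24}$$

When $n = 1$, (4.23)-(4.24) reduce, of course, to (4.3)-(4.4). The case of $\alpha \neq 1$ is, as before, rather involved, and since we cannot expect a closed-form formula for the geodesic distance $S_\alpha$, we shall only deal with the case of $\alpha = 1$. In this case, by (4.21)-(4.24),

$$ds^2(z) = 2\sum_{k=1}^{n} y_k^{-2}|dz_k|^2, \tag{4.25}$$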
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which  is,  as  in  (4.5),  the  Poincare  metric  on  Un. 

To find the geodesic distance $S$, we exploit the fact that the metric (4.25) is (globally) invariant under biholomorphic mappings. Accordingly, we use the mapping

$$\omega_k=\frac{z_k-i}{z_k+i},\quad z_k\in U,\ 1\le k\le n,\qquad(4.26)$$

which maps $U^n$ biholomorphically onto the polydisk $D^n=\{\omega=(\omega_1,\dots,\omega_n)\in\mathbb{C}^n:|\omega_k|<1,\ k=1,\dots,n\}$. With this mapping, the metric in (4.25) becomes

$$ds^2(\omega)=8\sum_{k=1}^{n}(1-|\omega_k|^2)^{-2}\,|d\omega_k|^2,\qquad(4.27)$$

which is the Poincaré metric on the polydisk $D^n$. We first find the geodesic distance $S(\omega,\tau)$ of this metric for $\omega,\tau\in D^n$. To do so, we take $\tau=0=(0,\dots,0)$ and evaluate $S(\omega,0)$, $\omega\in D^n$. We write


$$r=(r_1,\dots,r_n),\qquad r_k=|\omega_k|,\quad 0\le r_k<1;\ k=1,\dots,n,$$


and note that, owing to the invariance of (4.27) under the rotations $\omega_k\mapsto e^{i\theta_k}\omega_k$, $S(\omega,0)=S(r,0)$. In this way, we obtain


$$ds^2(r)=8\sum_{k=1}^{n}\frac{dr_k^2}{(1-r_k^2)^2}=2\sum_{k=1}^{n}\left[d\log\frac{1+r_k}{1-r_k}\right]^2,$$

and consequently

$$S(\omega,0)=\sqrt{2}\left\{\sum_{k=1}^{n}\log^2\frac{1+|\omega_k|}{1-|\omega_k|}\right\}^{1/2}.$$
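The two computations behind the passage from (4.25) to $S(\omega,0)$ can be checked by hand. The worked verification below, added for the reader and using only (4.25)-(4.27), shows that the Cayley map (4.26) carries $2y^{-2}|dz|^2$ to $8(1-|\omega|^2)^{-2}|d\omega|^2$ in each coordinate, and that $u=\log[(1+r)/(1-r)]$ linearizes the radial part:

```latex
% Cayley map (4.26), one coordinate: \omega = (z-i)/(z+i),\ z = x+iy \in U
\frac{d\omega}{dz}=\frac{2i}{(z+i)^2},\qquad
1-|\omega|^2=\frac{|z+i|^2-|z-i|^2}{|z+i|^2}=\frac{4y}{|z+i|^2},
\qquad\text{so}\qquad
\frac{8\,|d\omega|^2}{(1-|\omega|^2)^2}
=\frac{8\cdot 4\,|dz|^2/|z+i|^4}{16\,y^2/|z+i|^4}=\frac{2\,|dz|^2}{y^2}.

% Radial coordinate: u = \log\frac{1+r}{1-r}
\frac{du}{dr}=\frac{1}{1+r}+\frac{1}{1-r}=\frac{2}{1-r^2},
\qquad\text{so}\qquad
\frac{8\,dr^2}{(1-r^2)^2}=2\,du^2 .
```

In the coordinates $u_k=\log[(1+r_k)/(1-r_k)]$ the restricted metric is $2\sum_k du_k^2$, Euclidean up to the factor 2, so the straight segment from $0$ to $u$ has length $\sqrt2\,|u|$, which is the displayed value of $S(\omega,0)$.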


The formula for $S(\omega,0)$, as is well known, suffices to determine the distance between any two points of $D^n$. Indeed, given two points $\omega,\tau\in D^n$, there exists a holomorphic automorphism $\phi$ of $D^n$ onto $D^n$ such that $\phi(\omega)=v$, $\phi(\tau)=0\in D^n$. Again, by invariance, $S(\omega,\tau)=S(\phi(\omega),\phi(\tau))=S(v,0)$. Here, up to a rotation,

$$v_k=\frac{\omega_k-\tau_k}{1-\bar\tau_k\omega_k},\quad k=1,\dots,n.$$

It therefore follows that


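$$S(\omega,\tau)=\sqrt{2}\left\{\sum_{k=1}^{n}\log^2\frac{1+\delta(\omega_k,\tau_k)}{1-\delta(\omega_k,\tau_k)}\right\}^{1/2};\quad\omega,\tau\in D^n,\qquad(4.28)$$

where

$$\delta(\omega_k,\tau_k)=\left|\frac{\omega_k-\tau_k}{1-\bar\tau_k\omega_k}\right|,\quad k=1,\dots,n.\qquad(4.29)$$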
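As a numerical sanity check of (4.28)-(4.29), here is a minimal Python sketch. The function names `delta_disk`, `disk_automorphism` and `polydisk_distance` are illustrative only (they are not from the paper), and points of $D^n$ are assumed to be given as Python lists of complex numbers.

```python
import math


def delta_disk(w: complex, t: complex) -> float:
    """Pseudo-hyperbolic (Mobius) distance on the unit disk, as in (4.29)."""
    return abs(w - t) / abs(1 - t.conjugate() * w)


def disk_automorphism(w: complex, t: complex) -> complex:
    """Automorphism of the disk sending t to 0: w -> (w - t)/(1 - conj(t) w)."""
    return (w - t) / (1 - t.conjugate() * w)


def polydisk_distance(omega: list[complex], tau: list[complex]) -> float:
    """Geodesic distance (4.28) of the metric (4.27) on the polydisk D^n."""
    if len(omega) != len(tau):
        raise ValueError("points must have the same number of coordinates")
    total = 0.0
    for w, t in zip(omega, tau):
        d = delta_disk(w, t)
        total += math.log((1 + d) / (1 - d)) ** 2
    return math.sqrt(2) * math.sqrt(total)


if __name__ == "__main__":
    omega = [0.3 + 0.4j, -0.2j]
    tau = [0.1 - 0.5j, 0.6 + 0.1j]
    print(polydisk_distance(omega, tau))
```

Applying `disk_automorphism` coordinatewise with $t=\tau_k$ reproduces the reduction $S(\omega,\tau)=S(v,0)$ used above, and taking $\tau=0$ recovers the closed form for $S(\omega,0)$.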


Returning to the metric in (4.25), its geodesic distance $S(z,\zeta)$ between two points $z,\zeta\in U^n$ is obtained from (4.28)-(4.29) and the mapping in (4.26). This gives


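$$S(z,\zeta)=\sqrt{2}\left\{\sum_{k=1}^{n}\log^2\frac{1+\delta(z_k,\zeta_k)}{1-\delta(z_k,\zeta_k)}\right\}^{1/2};\quad z,\zeta\in U^n,\qquad(4.30)$$

with

$$\delta(z_k,\zeta_k)=\left|\frac{z_k-\zeta_k}{z_k-\bar\zeta_k}\right|,\quad k=1,\dots,n.\qquad(4.31)$$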
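To see concretely that the Cayley map (4.26) carries (4.28)-(4.29) into (4.30)-(4.31), the following Python sketch evaluates both sides at sample points of $U^n$. The helper names `cayley`, `delta_upper`, `delta_disk`, `sqrt2_log_norm` and `upper_distance` are hypothetical, chosen only for this illustration.

```python
import math
from collections.abc import Iterable


def cayley(z: complex) -> complex:
    """Mapping (4.26): upper half-plane U -> unit disk, z -> (z - i)/(z + i)."""
    return (z - 1j) / (z + 1j)


def delta_upper(z: complex, zeta: complex) -> float:
    """Mobius distance (4.31) on U: |z - zeta| / |z - conj(zeta)|."""
    return abs(z - zeta) / abs(z - zeta.conjugate())


def delta_disk(w: complex, t: complex) -> float:
    """Its disk counterpart (4.29): |w - t| / |1 - conj(t) w|."""
    return abs(w - t) / abs(1 - t.conjugate() * w)


def sqrt2_log_norm(deltas: Iterable[float]) -> float:
    """sqrt(2) * {sum_k log^2[(1 + d_k)/(1 - d_k)]}^{1/2}, the common form of (4.28) and (4.30)."""
    return math.sqrt(2) * math.sqrt(sum(math.log((1 + d) / (1 - d)) ** 2 for d in deltas))


def upper_distance(z: list[complex], zeta: list[complex]) -> float:
    """Geodesic distance (4.30) on U^n."""
    if len(z) != len(zeta):
        raise ValueError("points must have the same number of coordinates")
    return sqrt2_log_norm(delta_upper(a, b) for a, b in zip(z, zeta))
```

The agreement rests on the identity $|\omega-\tau|/|1-\bar\tau\omega|=|z-\zeta|/|z-\bar\zeta|$ for $\omega=(z-i)/(z+i)$, $\tau=(\zeta-i)/(\zeta+i)$. For two points on the same vertical line, $z=2i$ and $\zeta=i/2$, one gets $\delta=0.6$ and $S=\sqrt2\log4$.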


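The distance (4.30) generalizes (4.7)-(4.8). Finally, from (4.30)-(4.31), the information distance $S_n(\mu,\sigma;\nu,\rho)$ between a $\prod_{k=1}^{n}N(t_k:\mu_k,\sigma_k^2)$ distribution and a $\prod_{k=1}^{n}N(t_k:\nu_k,\rho_k^2)$ distribution is given by

$$S_n(\mu,\sigma;\nu,\rho)=\sqrt{2}\left\{\sum_{k=1}^{n}\log^2\frac{1+\delta(\mu_k,\sigma_k;\nu_k,\rho_k)}{1-\delta(\mu_k,\sigma_k;\nu_k,\rho_k)}\right\}^{1/2}\qquad(4.31)$$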


with

$$\delta(\mu_k,\sigma_k;\nu_k,\rho_k)=\left\{\frac{(\mu_k-\nu_k)^2+2(\sigma_k-\rho_k)^2}{(\mu_k-\nu_k)^2+2(\sigma_k+\rho_k)^2}\right\}^{1/2},\quad k=1,\dots,n,\qquad(4.32)$$


and where $\mu=(\mu_1,\dots,\mu_n)$, $\sigma=(\sigma_1,\dots,\sigma_n)$; $\nu=(\nu_1,\dots,\nu_n)$, $\rho=(\rho_1,\dots,\rho_n)$.
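A minimal Python sketch of the information distance (4.31)-(4.32) between two products of normal distributions follows. It assumes the parameters are passed as equal-length sequences of means and standard deviations ($\sigma_k,\rho_k>0$); the names `delta_normal`, `info_distance_1d` and `info_distance` are illustrative, not from the paper.

```python
import math
from collections.abc import Sequence


def delta_normal(mu: float, sigma: float, nu: float, rho: float) -> float:
    """delta(mu_k, sigma_k; nu_k, rho_k) from (4.32); sigma, rho > 0 are standard deviations."""
    num = (mu - nu) ** 2 + 2 * (sigma - rho) ** 2
    den = (mu - nu) ** 2 + 2 * (sigma + rho) ** 2
    return math.sqrt(num / den)


def info_distance_1d(mu: float, sigma: float, nu: float, rho: float) -> float:
    """Case n = 1: sqrt(2) * log[(1 + delta)/(1 - delta)]."""
    d = delta_normal(mu, sigma, nu, rho)
    return math.sqrt(2) * math.log((1 + d) / (1 - d))


def info_distance(mu: Sequence[float], sigma: Sequence[float],
                  nu: Sequence[float], rho: Sequence[float]) -> float:
    """S_n(mu, sigma; nu, rho) from (4.31), computed coordinate by coordinate."""
    if not (len(mu) == len(sigma) == len(nu) == len(rho)):
        raise ValueError("all parameter vectors must have length n")
    total = 0.0
    for m, s, v, r in zip(mu, sigma, nu, rho):
        d = delta_normal(m, s, v, r)
        total += math.log((1 + d) / (1 - d)) ** 2
    return math.sqrt(2) * math.sqrt(total)
```

With equal means the formula reduces to $\sqrt2\,|\log(\sigma/\rho)|$: for $\sigma=3$, $\rho=1.5$ one gets $\delta=1/3$ and $S=\sqrt2\log2$. Comparing `info_distance(...) ** 2` with the sum of squared one-dimensional distances gives a quick numerical check of the additivity property stated next.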


In view of (4.31)-(4.32) and (4.9)-(4.10), we may conclude the following desirable property of the information distance:

$$S_n^2(\mu,\sigma;\nu,\rho)=\sum_{k=1}^{n}S^2(\mu_k,\sigma_k;\nu_k,\rho_k).\qquad(4.33)$$


5.  THE  CARATHÉODORY  PSEUDO-DISTANCE


The information distance $S(\mu,\sigma;\nu,\rho)$ between $N(\mu,\sigma^2)$ and $N(\nu,\rho^2)$, given as in (4.9), suggests introducing a pseudo-distance on a theme of Carathéodory (see [4] for a further generalization). We briefly discuss this possibility and refer the reader to Burbea [2, 3, 4] and the book of Kobayashi [8, pp. 49-53] for further details.
We assume that the family $F$ of multiparametric probability distributions is such that $\Omega$ is a complex manifold in $\mathbb{C}^n$. Thus $p(\cdot\mid z)\in F$ with $z=(z_1,\dots,z_n)\in\Omega$ being an $n$-tuple of complex parameters $z_j=x_j+iy_j$, $1\le j\le n$.

We consider the Möbius and Poincaré distances $\delta$ and $S$ on the upper half-plane $U$, as given in (4.6)-(4.7). Let $H(\Omega:U)$ denote the family of holomorphic functions from $\Omega$ into $U$. We define

$$\delta_\Omega(z,\zeta)=\sup\{\delta(f(z),f(\zeta)):f\in H(\Omega:U)\};\quad z,\zeta\in\Omega.$$

A normal family argument shows that the supremum is attained. It is also clear that $\delta_\Omega$ satisfies all the axioms of a pseudo-distance on $\Omega$; it is called the Möbius pseudo-distance of $\Omega$. The Carathéodory pseudo-distance of $\Omega$ is defined by

$$S_\Omega(z,\zeta)=\sup\{S(f(z),f(\zeta)):f\in H(\Omega:U)\};\quad z,\zeta\in\Omega.$$

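Again, the supremum is attained and, by (4.6)-(4.7), since $t\mapsto\log[(1+t)/(1-t)]$ is increasing on $[0,1)$,

$$S_\Omega(z,\zeta)=\sqrt{2}\log\frac{1+\delta_\Omega(z,\zeta)}{1-\delta_\Omega(z,\zeta)}.$$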

Both pseudo-distances become distances on $\Omega$ when $\Omega$ is biholomorphically equivalent to a bounded domain in $\mathbb{C}^n$.

It is also clear that

$$S_\Omega(z,\zeta)\ge2\sqrt{2}\,\delta_\Omega(z,\zeta),\qquad 0\le\delta_\Omega(z,\zeta)<1;\quad z,\zeta\in\Omega.$$
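The inequality follows from the preceding formula for $S_\Omega$ and an elementary series bound, sketched below for $0\le t<1$:

```latex
\log\frac{1+t}{1-t}=2\operatorname{artanh}t=2\sum_{j=0}^{\infty}\frac{t^{2j+1}}{2j+1}\;\ge\;2t,
\qquad\text{hence}\qquad
S_\Omega(z,\zeta)=\sqrt2\log\frac{1+\delta_\Omega(z,\zeta)}{1-\delta_\Omega(z,\zeta)}\;\ge\;2\sqrt2\,\delta_\Omega(z,\zeta),
```

with equality only when $\delta_\Omega(z,\zeta)=0$.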


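Let $\phi:\Omega\to\Omega^*$ be a holomorphic mapping of a complex manifold $\Omega$ of $\mathbb{C}^n$ into another complex manifold $\Omega^*$ of $\mathbb{C}^m$. Then, for $z,\zeta\in\Omega$,

$$\delta_{\Omega^*}(\phi(z),\phi(\zeta))\le\delta_\Omega(z,\zeta)$$

and

$$S_{\Omega^*}(\phi(z),\phi(\zeta))\le S_\Omega(z,\zeta).$$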

In particular, $\delta_\Omega$ and $S_\Omega$ are biholomorphic invariants. Also, in the case that $\Omega$ is the upper half-plane $U$, we have

$$\delta_U=\delta,\qquad S_U=S,$$

and, therefore, $\delta_\Omega$ and $S_\Omega$ constitute a natural generalization of $\delta$ and $S$ in (4.6)-(4.7).
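The distance-decreasing property can be illustrated numerically in the simplest case $\Omega=\Omega^*=U$, where $\delta_U=\delta$ by the Schwarz-Pick lemma. The Python sketch below uses, as a hypothetical example map, the principal square root, which sends $U$ holomorphically into the open first quadrant and hence into $U$. It checks both inequalities at randomly sampled pairs; the sampling box and tolerances are arbitrary choices, not from the paper.

```python
import cmath
import math
import random


def delta_upper(z: complex, zeta: complex) -> float:
    """Mobius distance on U, the n = 1 case of (4.31): |z - zeta| / |z - conj(zeta)|."""
    return abs(z - zeta) / abs(z - zeta.conjugate())


def poincare_upper(z: complex, zeta: complex) -> float:
    """Poincare distance on U: sqrt(2) * log[(1 + delta)/(1 - delta)]."""
    d = delta_upper(z, zeta)
    return math.sqrt(2) * math.log((1 + d) / (1 - d))


def phi(z: complex) -> complex:
    """Example holomorphic map U -> U (principal square root); illustrative choice."""
    return cmath.sqrt(z)


def check_contraction(trials: int = 1000, seed: int = 0) -> bool:
    """Return True if neither delta nor S increases under phi at the sampled pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        z = complex(rng.uniform(-5, 5), rng.uniform(0.01, 5))
        zeta = complex(rng.uniform(-5, 5), rng.uniform(0.01, 5))
        # small tolerances absorb floating-point rounding only
        if delta_upper(phi(z), phi(zeta)) > delta_upper(z, zeta) + 1e-12:
            return False
        if poincare_upper(phi(z), phi(zeta)) > poincare_upper(z, zeta) + 1e-9:
            return False
    return True


if __name__ == "__main__":
    print(check_contraction())
```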

When $\Omega=U^n$ we have, in contrast to (4.33),

$$\delta_{U^n}(z,\zeta)=\max\{\delta(z_1,\zeta_1),\dots,\delta(z_n,\zeta_n)\}$$

and, therefore,

$$S_{U^n}(z,\zeta)=\max\{S(z_1,\zeta_1),\dots,S(z_n,\zeta_n)\},$$

where $z=(z_1,\dots,z_n)$, $\zeta=(\zeta_1,\dots,\zeta_n)\in U^n$.
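The contrast with (4.33) can be made concrete. On $U^n$ the Carathéodory pseudo-distance takes the maximum of the coordinate distances, whereas the information distance (4.30) combines them in an $\ell^2$ fashion, so $S_{U^n}\le S\le\sqrt n\,S_{U^n}$. Below is a minimal Python sketch; the helper names are hypothetical, and points of $U^n$ are assumed to be lists of complex numbers.

```python
import math


def poincare_upper(z: complex, zeta: complex) -> float:
    """One-coordinate distance S(z_k, zeta_k) = sqrt(2) * log[(1 + delta)/(1 - delta)]."""
    d = abs(z - zeta) / abs(z - zeta.conjugate())
    return math.sqrt(2) * math.log((1 + d) / (1 - d))


def _check_lengths(z: list[complex], zeta: list[complex]) -> None:
    if len(z) != len(zeta):
        raise ValueError("points must have the same number of coordinates")


def caratheodory_polyplane(z: list[complex], zeta: list[complex]) -> float:
    """S_{U^n}(z, zeta) = max_k S(z_k, zeta_k)."""
    _check_lengths(z, zeta)
    return max(poincare_upper(a, b) for a, b in zip(z, zeta))


def information_polyplane(z: list[complex], zeta: list[complex]) -> float:
    """S(z, zeta) from (4.30), rewritten as {sum_k S(z_k, zeta_k)^2}^{1/2}."""
    _check_lengths(z, zeta)
    return math.sqrt(sum(poincare_upper(a, b) ** 2 for a, b in zip(z, zeta)))
```

When the two points differ in a single coordinate, the two distances coincide. Otherwise the maximum is strictly smaller than the $\ell^2$ combination, so only the information distance has the additivity (4.33).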


References 


[1] Atkinson, C. and Mitchell, A. F. S. (1980). Rao's distance measure. Sankhyā (in press).

[2] Burbea, J. (1977). The Carathéodory metric and its majorant metrics. Canad. J. Math., 29, 771-780.

[3] Burbea, J. (1980). A generalization of Pick's theorem and its applications to intrinsic metrics. Ann. Polon. Math., 39 (in press).

[4] Burbea, J. (1980). On metrics and distortion theorems. Ann. of Math. Studies, 100 (in press).

[5] Burbea, J. and Rao, C. R. (1980). On the convexity of some divergence measures based on entropy functions. Tech. Rep. No. 80-13, University of Pittsburgh.

[6] Fisher, R. A. (1925). Theory of statistical estimation. Proc. Camb. Phil. Soc., 22, 700-725.

[7] Havrda, M. K. and Charvát, F. (1967). Quantification method of classification processes: Concept of structural α-entropy. Kybernetika, 3, 30-35.

[8] Kobayashi, S. (1970). Hyperbolic Manifolds and Holomorphic Mappings. Marcel Dekker, New York.

[9] Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Ann. Math. Statist., 22, 79-86.

[10] Matusita, K. (1955). Decision rules based on the distance, for problems of fit, two samples and estimation. Ann. Math. Statist., 26, 631-640.

[11] Matusita, K. (1957). Decision rule based on the distance for the classification problem. Ann. Inst. Statist. Math., 8, 67-77.

[12] Pitman, E. J. G. (1979). Some Basic Theory for Statistical Inference. Halsted Press.

[13] Rao, C. R. (1945). Information and accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc., 37, 81-91.

[14] Rao, C. R. (1949). On the distance between two populations. Sankhyā, 9, 246-248.




[15] Rao, C. R. (1962). Efficient estimates and optimum inference procedures in large samples (with discussion). J. Roy. Statist. Soc. B, 24, 46-72.

[16] Rao, C. R. (1973). Linear Statistical Inference and its Applications. John Wiley, New York.

[17] Rao, C. R. (1980). Diversity and dissimilarity coefficients: a unified approach. Tech. Rep. 80-10, University of Pittsburgh.

[18] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Tech. J., 27, 379-423.


